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Abstract 

Genetic modification of plants may result in unintended effects causing poten- 
tially adverse effects on the environment. A comparative safety assessment is 
therefore required by authorities, such as the European Food Safety Authority, 
in which the genetically modified plant is compared with its conventional coun- 
terpart. Part of the environmental risk assessment is a comparative field experi- 
ment in which the effect on non-target organisms is compared. Statistical 
analysis of such trials come in two flavors: difference testing and equivalence 
testing. It is important to know the statistical properties of these, for example, 
the power to detect environmental change of a given magnitude, before the 
start of an experiment. Such prospective power analysis can best be studied by 
means of a statistical simulation model. This paper describes a general frame- 
work for simulating data typically encountered in environmental risk assess- 
ment of genetically modified plants. The simulation model, available as 
Supplementary Material, can be used to generate count data having different 
statistical distributions possibly with excess-zeros. In addition the model 
employs completely randomized or randomized block experiments, can be used 
to simulate single or multiple trials across environments, enables genotype by 
environment interaction by adding random variety effects, and finally includes 
repeated measures in time following a constant, linear or quadratic pattern in 
time possibly with some form of autocorrelation. The model also allows to add 
a set of reference varieties to the GM plants and its comparator to assess the 
natural variation which can then be used to set limits of concern for equiva- 
lence testing. The different count distributions are described in some detail and 
some examples of how to use the simulation model to study various aspects, 
including a prospective power analysis, are provided. 



Introduction 

An essential element in the environmental risk assessment 
(ERA) of genetically modified (GM) plants is a compara- 
tive field trial in which the effect on non-target organisms 
(NTO), such as aphids, beetles and bumble bees is com- 
pared. Such an experiment ensures that the GM plant 
and its comparator(s) are grown under the same manage- 
ment and environmental conditions, thus enabling a fair 
and objective comparison. A basic statistical approach for 
designing and analyzing such field experiments has been 



outlined in an EFSA guidance document (EFSA, 2010), in 
Perry et al. (2009) and Semenov et al. (2013). However, 
in practice the power of these experiments to detect envi- 
ronmental changes of a given magnitude is often 
unknown, partly because insufficient prior thought is 
given to what exact endpoints the experiments are sup- 
posed to test, and partly because the complex nature of 
ecological data complicates the power calculations. One 
of the aims in the EU-funded project "Assessing and 
Monitoring the Impacts of Genetically modified plants 
on Agro-ecosystems" (AMIGA) is to devise statistically 
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statistical Simulation Model for ERA 

well-based protocols for the design and analysis of field 
trials. To prepare this action an inventory was made of 
existing field studies in the literature, and a statistical 
simulation model was developed to mimic ecological data 
such as found in practice. The aim of the current paper is 
to describe this statistical simulation model and to show 
how this can be used in the design of field experiments. 

Data collection in experimental fields with genetically 
modified crops has been conducted for many years and a 
large variability of experimental designs, sampling tech- 
niques, guilds of non-target arthropods and statistical 
methods have been used (e.g., Marvier et al., 2007). To 
summarize the different approaches presented in the sci- 
entific literature, a non- exhaustive inventory of 33 field 
studies with insect resistance transgenic plants was com- 
piled among those firstly published, where the detection 
of possible effects of GM plants on natural enemies was 
the primary goal of the study (Table 1). The papers were 
published from 1992 until 2005. This time-span was cho- 
sen to include the very first published experiments of this 
kind, and also to incorporate the first available data from 
surveys in GM commercial fields. Different crops were 
included in the selection. The table presents some of the 
indicators relevant to the experimental design, collection 
methods and statistical analyses performed on the data. 
None of the papers provided a prospective power analysis 
for the experiments described. 

Field trials are thus diverse, but an example shows 
some typical elements. Al-Deeb and Wilde (2003) 
describe experiments to test the effects of the Cry3Bbl 
toxin in Bt corn on aboveground non-target arthropods. 
The experiments were performed on eight locations in 
one year and three locations in a second year, and they 
involved three GM varieties and two isolines in combina- 
tion with up to nine different seed and spraying treat- 
ments. Randomized complete block designs were used 
with 2-4 blocks and 8-40 plots. Visual inspection pro- 
vided count data on 15-20 plants per plot. Average 
counts per plant for five NTOs varied between 0 and 70. 
PitfaU trap count data observed at 3-7 time points were 
reported as average numbers per pitfall trap between 0 
and 616 for eight NTOs. Based on a statistical analysis 
using analysis of variance the authors concluded that no 
significant differences in numbers were detected between 
Bt corn and its non-Bt isoline. However, there is no 
mention of the effect sizes that these experiments would 
have been able to detect with a reasonable statistical 
power. In fact, the data provided are insufficient to draw 
any conclusion on the statistical power of the performed 
experiments, and this is also the case for many other 
reported studies. Indeed, in a few cases the importance of 
such an analysis had been singled out (Andow, 2003) and 
attempts to design field experiments on such bases were 
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done in rare cases (e.g.. Squire et al., 2003; Duan et al, 
2006). To improve this situation the EFSA guidance asks 
for prospective power analyses to be performed. This 
issue is further developed in the present paper. 

Typical data in environmental risk assessment of GM 
plants are counts or presence/absence data of NTOs. The 
basic distribution for counts is the Poisson distribution, 
whQe presence/absence data can usually be modeled by a 
binomial distribution. Clumping of individuals might give 
rise to an overdispersed distribution such as the negative 
binomial for counts and the beta-binomial distribution 
for presence/absence data. Also the number of zero obser- 
vations can be larger than predicted by the distribution 
and this gives rise to so-called excess-zero distributions. 
In many experiments, NTOs are sampled at different 
points in time, for example weekly, for all experimental 
units. The data are thus repeated measurements probably 
with some form of autocorrelation across time within 
experimental units. Depending on the species various pat- 
terns across time are possible. Moreover experiments are 
frequently repeated on different locations and in different 
years. 

The statistical analysis of ERA field trials comes in two 
flavors: difference testing and equivalence testing (van der 
Voet et al, 2011). The aim of the difference test is to 
reject the null hypothesis of no difference between the 
GM plant and its comparator. A significant difference test 
is then a "proof of difference", but this does not state that 
the difference is biologically relevant and constitutes a 
true hazard to the environment. Poorly designed experi- 
ments with low levels of replication may have low statisti- 
cal power of finding a true difference. So the absence of a 
significant difference is not a proof that there is no differ- 
ence, or "absence of evidence is not evidence of absence" 
(Altman and Bland, 1995). An equivalence test on the 
other hand employs a null hypothesis of non-equivalence, 
that is, that the difference between the GM plant and its 
comparator is larger than some pre-described equivalence 
limit, also called limit of concern (LOG). Rejection of the 
non-equivalence hypothesis implies that the difference is 
smaller than the LOG and this can be regarded as a 
"proof of safety". The advantage of equivalence testing is 
therefore that the onus is placed back on to those who 
wish to demonstrate the safety of GMOs to do high qual- 
ity, weU-replicated experiments with sufficient statistical 
power (Perry et al, 2009). Note that both the difference 
and equivalence test can be implemented by constructing 
a single confidence interval for the difference between the 
GM plant and its comparator. This employs the two one- 
sided tests (TOST) approach of Schuirmann (1987) for 
equivalence testing. 

It is important to know the statistical properties of dif- 
ference and equivalence tests, for example the power and 
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robustness of a test and whether the test has the assumed 
significance level. Such properties are well-known for sin- 
gle experiments using tests based on the normal distribu- 
tion, such as f-tests. For non-normal distributions, small 
sample properties of difference and equivalence tests are 
not straightforward. A simulation approach for sample 
size calculations for a difference test is employed by many 
authors, for example, Shieh (2001) and Hrdlickova (2006) 
for the Poisson distribution, Shieh (2001) and Demidenko 
(2008) for the binomial distribution, Aban et al. (2009) 
and Friede and Schmidli (2010) for the negative binomial 
distribution. A general practical approach to computing 
power for non-normal distributions is given by Lyles 
et al. (2007). However field testing of environmental 
effects of GM plants on NTOs is much more complicated 
as it may not only involve non-normal distributions, 
potentially with excess-zeros, but also a set of reference 
varieties in addition to the GM plant and its comparator, 
randomized blocks within an experiment, multiple experi- 
ments across different sites and/or years with possibly 
genotype by environment interaction, and finally repeated 
measures in time exhibiting some pattern in time possibly 
with autocorrelation. The object of this paper is to for- 
malize all these elements in a single statistical simulation 
model which provides a framework for studying various 
statistical approaches for data analysis of such experi- 
ments. The simulation model was implemented in a user- 
friendly C# program, using the R package (R Core Team, 
2012) for simulating from various distributions. The soft- 
ware is available as Supplementary Material to this paper. 

This paper first summarizes potentially useful statistical 
distributions for ecological data. Then the other elements 
of the statistical simulation model are described, namely 
block effects, additional varieties, repeated measurements 
and multiple trials. Some applications of the simulation 
model for power analysis are described, and possibilities 
for use and future research needs are discussed. 

Statistical Distributions for Counts 

The basic distribution for counts is the Poisson distribu- 
tion. The Poisson distribution arises when events occur 
independently of each other but at a fixed rate in time or 
space. The number of events in a fixed time- or space- 
interval then follows a Poisson distribution. The theoreti- 
cal variance of the Poisson distribution equals the mean 
II. Examples of three Poisson distributions are given in 
Figure 1. The Poisson distribution assumes a fixed rate of 
events in time or space. However frequently this rate 
might vary in different time- or space-intervals. A com- 
mon way to model this is to assume inter-subject vari- 
ability, also called mixing. It is then assumed that a count 
X follows a Poisson distribution with mean Z, where Z 
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itself is a random variable with mean ii and variance say 
T^. The marginal mean of the distribution of X is then 
given by /( and the variance equals fi + x^. Consequently 
the resulting distribution has a variance which is larger 
than the mean and this is termed over-dispersion. Three 
common ways to specify the mixing distribution of Z 
result in the overdispersed Poisson distribution, the nega- 
tive binomial distribution and the Poisson-Lognormal 
distribution. These are described below. 

The overdispersed Poisson distribution arises when Z 
follows a gamma distribution with variance = ((/»— 1) li 
which is proportional to the mean ji. The resulting distri- 
bution is a special form of the negative binomial distribu- 
tion, see McCuUagh and Nelder (1989), with variance 0/i 
which is also proportional to the mean. Figure 1 shows 
some examples of the overdispersed Poisson distribution. 
The so-called quasi likelihood approach is commonly 
used to fit this distribution. This employs the Poisson 
likelihood, estimates the dispersion parameter (/) by means 
of Pearson Chi-squared statistic and adjusts standard 
errors of estimates accordingly (McCuUagh and Nelder, 
1989). 

The negative binomial distribution arises when the 
mixing distribution Z follows a gamma distribution with 
mean /.( and variance ojyi^. The marginal mean of X is 
then again ^ and the variance equals /.( + ca^C^. Some 
examples of the negative binomial distribution are given 
in Figure 1. This shows that the negative binomial distri- 
bution, with a large dispersion parameter m, has a large 
zero probability and a rather flat tail. 

Specification of a lognormal distribution for Z, with 
say mean X and variance a^, results in the so-called Pois- 
son-Lognormal distribution. This is equivalent to the 
introduction of a normally distributed random effect on 
the scale of the linear predictor in Poisson regression, see 
Breslow (1984). The mean of the marginal distribution is 
given hy 11= exp(i + \o^) and the variance by /.( -I- [exp 
((7'^)— l)]/i^. This is thus the same variance function as 
the negative binomial distribution with co = [exp(cr^)— 1]. 
Despite this the distributions can be quite different for 
large /,( and m as is shown in Figure 1. 

A different approach was introduced by Taylor (1961) 
who proposed the power relationship V = a^/ between 
the variance V and the mean ^ for field population 
counts. This pioneering paper was followed by a series of 
papers, notably Taylor et al. (1978, 1980), in which it was 
shown that this relationship fitted well for many species, 
with varying values of a and j6 depending on the species 
at hand. The relationship was subsequently termed 
Taylor's power law by some authors. Perry et al. (2003) 
and Clark et al. (2006) advocate the use of the power law 
for analyses of count data obtained in farm scale evalua- 
tions of GM herbicide-tolerant crops. They found that 
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Figure 1. Examples of probabilities of 
statistical distributions for counts for means 
/I = 1, 4, and 10. The variance of the 
overdispersed Poisson distribution equals 
The variance of the negative binomial and 
Poisson-lognormal equals /i + aj/i^. 



median values of the power f!, when considering groups 
of indicator species, all fall between 1.5 and 2.0, averaging 
1.7 overall. There is no statistical distribution associated 
with Taylor's power law, as it only specifies a relationship 
between the variance and the mean. Perry et al. (2003) 
used the negative binomial distribution to simulate 
according to Taylor's power law by solving m from 
fi + mf/ = dj^i^ for given values of /i, a and /J. Using the 
negative binomial is however arbitrary, as, for example, 
the Poisson-Lognormal has the same variance to mean 
relationship, but has a different distribution. 

Statistical Distributions for Presence/ 
Absence Data 

In field experiments the presence or absence of an organ- 
ism might be recorded rather than counting the organism. 
The response X might then be the number of plants on 
which a specific organism is present for each experimental 
unit. Assuming independence between the n plants and a 
fixed presence probability n the response follows a bino- 
mial distribution. The mean of the binomial distribution is 
given by nn and the variance equals nn(l — n). Examples 
of the binomial distribution are given in Figure 2. 



Overdispersion arises by assuming that the presence 
probability itself now called Z, follows some statistical 
distribution with mean n and some variance t^. It follows 
that the marginal mean of X itself equals nn and the vari- 
ance equals nn{l — n) + n{n — 1)t^ which is larger than 
the variance of the binomial distribution. A popular 
choice for Z is the beta distribution which results in the 
so-called beta-binomial distribution. An alternative is to 
assume that the logit transform of Z follows a normal dis- 
tribution. Details of both distributions are given below. 

The beta-binomial distribution arises when Z follows a 
Beta distribution which is defined on the interval (0,1). A 
convenient re-parameterization results in a mean nn and 
variance nn{l — n)[l + (n — l)<p]. When the number of 
binomial trials is equal across experimental units, the term 
between squared brackets is constant and the variance of 
the beta-binomial distribution is then proportional to the 
binomial variance. In this case data can be easily analyzed 
by the quasi likelihood approach, similar to the analyses 
for the overdispersed Poisson distribution. Some examples 
of the beta-binomial distribution are given in Figure 2. 
This shows that the range of possible outcomes is 
extended as compared to the binomial distribution. How- 
ever for very large values of oj the distribution becomes 
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Figure 2. Examples of probabilities of 
statistical distributions for presence/absence 
data for n = 16 witin mean nn and variance 
conn (1 - Tt). For tine binomial distribution, 
CO = 1. 
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bath-tub like with large probabilities for outcomes 0 and n 
and small probabilities for intermediate values. 

An alternative is to assume that Z follows a logit-nor- 
mal distribution. This is equivalent to the introduction of 
a normally distributed random effect on the scale of the 
linear predictor in logistic regression. For obvious reasons 
this distribution can be termed binomial-logitnormal. 
Unfortunately the mean and variance of the logit-normal 
distribution cannot be written in analytical form, and this 
is thus also the case for the binomial-logitnormal distri- 
bution itself Probabilities can be obtained by integrating 
out the random effect by Gauss-Hermite quadrature. 
Some examples of this distribution are given in Figure 2; 
the parameters which are used in this figure are such that 
the mean and variance of the distribution are given by nn 
and amn{l — n). This shows that for n = 0.5 there is 
hardly a difference with the beta-binomial distribution, 
while for smaller values of n the binomial-logitnormal 
distribution has somewhat smaller zero probabilities. 

Excess-Zeros Distributions 

Although the overdispersed count distributions have a 
larger zero probability than the corresponding Poisson or 
binomial distribution, in practice the number of zero 
observations can stiU be larger than predicted by the 
count distribution. This is termed excess-zeros or zero- 
inflation. Examples of situations with excess-zeros are 
given by Cunningham and Lindenmayer (2005), Sileshi 
(2008) and Lewin et al. (2010). Failure to account for 
zero-inflation in a statistical analysis may results in biased 
estimation of environmental effects of GM plants. A com- 



mon model for zero-inflation assumes that a proportion 
<5 of the experimental units have a structural zero and the 
remaining proportion (1 — S) of units follows one of the 
count distributions given above. The zero-inflated distri- 
bution for the resulting count Y is then given by 



P{Y = y) 



5 + {1 - d)P,{X = 0) y = 0 
{l-8)P,{X=y) y>0 



in which P^X = x) is the distribution of the counts. Note 
that the probability of observing a zero is given by the 
probability 5 of obtaining a structural zero plus the prob- 
ability of obtaining a zero by chance. Having a lot of 
zeros in itself does not necessarily mean that a zero- 
inflated model is needed. An examples of this is given by 
the negative binomial distribution in Figure 1. 

The mean of a zero-inflated Poisson distribution equals 
/i(l — (5) and its variance equals /i(l — 5){l + 3i.i). 
Regression models based on the zero-inflated Poisson dis- 
tributions were introduced by Lambert (1992) who con- 
sidered simultaneous modeling of /i and (5 which are 
related to possibly different sets of covariates. Greene 
(1994) brought regression modeling to the zero-inflated 
negative binomial distribution. Hall (2000) and Vieira 
et al. (2000) seem to be the first papers which employ a 
zero-inflated binomial model. Finally, Cheung (2006) uses 
a zero-inflated beta-binomial model to analyze cognitive 
function test scores of Indonesian children. 

Statistical Simulation Model 

Having defined possible probability distributions for 
counts and presence/absence data, the other elements of 
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Table 2. Elements of the statistical simulation model. 



Element 


Possible choices 


riic+rihi i+inn of i"ni in+c 


PriK<;rin n\/orHic"norcoH Pniccnn nonatn/o Kinrimial Prutrcrin-lnrinnrmal Kinnmi^l ho+^Kinnmi^l Kinnmi^l-lnnifnni'mal 
r uiiiui 1, uvfciiui3[Jtri itru r ui!>!>ui i, iitrudLivt: uiinjiiiiai, r ui jjui injuiiuiiiiai, Liiinjriiiai, utrLauiiiuiiiicii, uiiiuiii lai luy i li iui i i i oi 


Excess-zero counts 


No/yes 


Design 


Completely randomized, randomized blocks, number of replications 


Additional varieties 


Number of additional varieties or treatments in addition to the GM plant and its comparator 


Reference varieties 


Number of reference varieties which represent a population 


Trial 


Single trial, multiple trials, site x year trials 


Measurement 


Single time point, repeated measures (constant, linear or quadratic in time, autocorrelation) 


Parameters 


Parameter values for all the count distributions, for example, a mean and an excess-zero probability for each variety 



field testing of environmental effects of GM plants on 
NTOs can be introduced. These elements are summarized 
in Table 2. 

Block effects and the transformed scale 

An experiment will minimally consist of a GM plant and 
its comparator. When these are compared in a single 
trial without blocking, with only a single measurement 
per experimental unit and no excess-zeros, simulation of 
such an experiment only requires specification of the 
mean /i of the count distribution for both the GM plant 
and the comparator. In addition, when overdispersion is 
required, a common dispersion parameter must be speci- 
fied. Adding blocking to this experiment requires a ran- 
dom blocking effect. It is natural and common to 
introduce blocking effects for counts on the log scale, 
that is, 

log(/i) = (fixed variety effect) + (random blocking effect) 

as this ensures that the mean ji of the count is always posi- 
tive. Note that this also requires that the variety effect of the 
GM plant and the comparator is specified on the log scale. 
Likewise, the logit transformation can be used to specify the 
probability of success n for the presence/absence data and/or 
the excess-zeros probability S, for example, 

logit(7r) = \og(j^-^ 

= (fixed variety effect) 

+ (random blocking effect) 

This ensures that probabilities are always in the interval 
(0,1). So in the simulation model, all effects are intro- 
duced on the natural log scale for counts and on the logit 
scale for probabilities. 

When an experiment with blocking is required, it is 
assumed that block effects follow a normal distribution 
with some variance. For each new block, a block effect is 
simulated according to this distribution and added to 
the variety effect on the transformed scale. Note that 



when an excess-zero distribution is used, a block 
variance must be specified for both the count or pres- 
ence/absence distribution as well as for the excess-zero 
probabihty. 

Additional varieties and reference varieties 

In addition to the GM plant and the comparator, other 
varieties can be introduced in the experiment. These 
might be other comparators or other GM plants, or alter- 
natively, the GM plant and/or comparator itself which 
receive different agronomical treatments with for instance 
herbicides. Although the latter are not varieties but rather 
treatments, for simplicity, these are also termed additional 
varieties in the simulation model. For each of the addi- 
tional varieties, a variety effect on the transformed scale 
must be specified. 

A special case is an experiment in which the GM plant 
and its comparator are to be compared with a group of 
reference varieties which are assumed to have a history of 
safe use (Van der Voet et al., 2011). In this case, the indi- 
vidual reference varieties themselves are not of interest, 
but they are rather used to derive baselines or equivalence 
limits. The reference varieties in the experiment might 
thus be considered as representing a population of refer- 
ence varieties. It is then natural to assume that the variety 
effect of each reference variety is drawn from a statistical 
distribution. For convenience, a normal distribution is 
used for this. The difference between additional and refer- 
ence varieties is that for each additional variety, a fixed 
variety effect must be specified, whereas for reference 
varieties, only a common variety effect and an associated 
variance must be specified. 

At this stage, it is instructive to go through the differ- 
ent steps in the statistical simulation model for a hypo- 
thetical small experiment. Suppose a GM plant, its 
comparator and three reference varieties are to be com- 
pared in a randomized block experiment with 2 blocks. A 
zero-inflation Poisson distribution is used with the fol- 
lowing parameters for the varieties: 
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First the count effect on the log scale of the three refer- 
ence varieties is drawn from a normal (1,1) distribution; 
suppose that this results in random draws 0.8, 0.9 and 
1.2. Secondly, the excess-zero probability effect on the 
logit scale for the reference varieties is drawn from a 
normal (-0.8, 0.5) distribution; suppose that this results 
in -1.2, -0.9 and -0.6. Furthermore, assume that the 
between block variance for the counts equals 0.1 with 
random draws —0.4 and 0.1 for the two blocks and that 
the between block variance for the excess-zero probabili- 
ties equals 0.01 with random draws —0.1 and 0.2. The 
mean count ii and excess-zero probability S for every plot 
in the experiment are then given by 
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These values are then used to generate a count for each 
plot. In a simulation study, typically many datasets are 
simulated with the same settings. For each dataset, new 
reference variety effects and new blocking effects are sim- 
ulated, so only the bold values remain the same for each 
simulated dataset. 

Repeated measurements 

Non-target organisms at the same experimental unit are 
frequently sampled at different points in time, see for 
example Head et al. (2005). The simulation model 
assumes that time points are equidistant, that is, 1, 2, . . ., 
T. There are many possible patterns in time. The simula- 
tion model accommodates constant, linear or quadratic 
time patterns, all on the transformed scale, and this can 
be set separately for the mean of the counts and for the 
excess-zero probability. Moreover, the repeated observa- 
tions on the same experimental unit can be independent 



or can be correlated. The mean /i, of each variety at time 
point t, assuming no block effects, is given by log 
(/(f) = fp{t) + v„ where fp(t) is a polynomial up to order 
p = 2. The extra random effect Vf specifies the correlation 
between repeated measures. Absence of a time effect is 
simply given by /o(f) = /Jo, a linear time effect by 
flit) = Po + Pit and a quadratic time effect by 
flit) = Po+Pit+P2t^- An alternative parameterization for 
the second- order polynomial with more meaningful 
parameters is given by /2(f) = /J^^x - it -/?opt)^/(2Aoi)> 
where the minimum or maximum jSn,ax is attained for the 
optimal time point Popt and the parameter /?,oi represents 
the width of the quadratic curve, also called the tolerance. 
This latter parameterization is used in the simulation tool. 
For a positive tolerance, the parabola has a maximum, 
while for a negative tolerance, it has a minimum. The 
vector of random effects v = (vj, . . ., vy) is assumed to 
follow a multivariate normal distribution, that is, 
v~ MN(0, crj V) where V is a T x T symmetric correlation 
matrix. The simulation tool implements three options for 
the random effects: 

1 No extra variability as given by al = 0. 

2 Equal correlation across time by setting = 1 and 
Vj^i = p for k ^ I 

3 Autoregressive correlation across time by setting 

V„ = pl*^-'! 

Note that the second and third option involves an 
overdispersion mechanism. Consequently, an equal corre- 
lation across time with p = 0 combined with the Poisson 
distribution is equivalent with the Poisson-lognormal dis- 
tribution with no extra variability across time. A combi- 
nation with, for example, the negative binomial 
distribution however involves two levels of overdisper- 
sion. The simulation tool requires specification of the P 
parameters for the GM plant and its comparator and also 
for the additional varieties. When excess-zeros are desired, 
a different set of P parameters must be specified for the 
excess-zero probability model. For the reference varieties, 
each P parameter is drawn from a normal distribution 
with specified mean and variance. 

Multiple trials 

Multiple field trials across environments fall into two 
main classes. One in which there is no further structure 
across trials, for example, when trials are conducted at 
different sites, and secondly, when the trials follow a 
site X year structure. In the first case, there might be ran- 
dom trial effects such that trials will vary in their level of 
response without affecting differences between varieties. 
This is in addition to random block effects within trials. 
In the second case, experiments, carried out at different 
sites, are replicated for a limited number of years. Then, 
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random site and random year effects are conceivable as 
well as a random site by year interaction effect. Note that 
these random effects are added to the other effects on the 
transformed (log or logit) scale. 

In addition to additive multiple trial effects on the 
transformed scale, variety effects might be different from 
trial to trial, from site by site or from year to year. This 
can be termed genotype by environment interaction. This 
interaction can be accommodated by additional random 
effects which operate on variety effects. Instead of a fixed 
variety effect, for e.g. the GM plant, in each separate trial, 
the variety effect is drawn from a normal distribution 
with some mean and variance. For the reference varieties, 
there are two stages of simulation. In the first stage, a ref- 
erence mean, say M, for the trial is drawn from a normal 
distribution with specified mean and variance. In the sec- 
ond stage, the effects of the reference varieties in that spe- 
cific trial are simulated from a normal distribution with 
mean M and some other specified variance. Similarly for 
site X year trials, three variance components can be dis- 
tinguished for every variety effect. In case of repeated 
measurements with say a quadratic time effect, all the 
time effect /? parameters have their associated variance 
components and many parameters need to be specified. 
Moreover, the same statistical model can be specified for 
the excess-zero probabiHty. 

Examples of Simple Simulations 

The simulation model was implemented in a software 
tool, available as Supplementary Material, and was used to 
perform a series of simulations to demonstrate various 
aspects which can be studied by means of the tool. To this 
end, single, completely randomized trials were simulated 
to assess properties of statistical difference and equivalence 
testing. In addition to the GM plant and its comparator, 
one additional variety was included in every simulation 
and the simulated response was a count. The mean of the 
comparator and the additional variety was assumed to be 
equal to say /i, and the mean of the GMO is denoted by 
Oil such that 6 is the multiplicative difference between the 
GM plant and the comparator. The following levels of 
replication N were employed: N = 4, 6, 8, 10, 15, 20, 30 
and 40. A two-sided test with a significance level of 
a = 0.05 was used throughout. In each example, 1000 
datasets were simulated for every parameter combination. 

Power of difference test for the negative 
binomial distribution 

The power of a likelihood ratio test for the difference 
between a GM plant and its comparator was studied. 
Data were simulated according to the negative binomial 
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distribution. All combinations of the following values 
were used: 

/i 1, 2, 5, 10, 20, 40 

B 1, 1.2, 1.4, 1.6, 1.8, 2.0 

oj 0.25, 0.5, 1 

The negative binomial distribution was fitted to each 
dataset, first under the restriction that the mean of the 
GMO equals the mean of the comparator and secondly, 
without this restriction. A likelihood ratio test statistic is 
then given by twice the difference between the log likeli- 
hoods of the two models. The large sample distribution 
of this test statistic is /p and this distribution was used 
to calculate P values. The (simulated) power of the differ- 
ence test is then given by the fraction of the 1000 simu- 
lated datasets for which the null hypothesis of no 
difference is rejected. Examples of resulting power curves 
are given in Figure 3. The number of replications 
required to detect a multiplicative difference 6 = 2 
between the mean of the GM plant and the comparator 
with probability 0.80 is given in Table 3 for the various 
values of /( and oj. These values were interpolated from 
the values of N which are used in the simulation. As 
expected, the power is larger when there is less overdis- 
persion (smaller values of oj) and when the mean of the 
distribution is large. 

Sensitivity analysis for the negative 
binomial distribution 

The true underlying distribution of field count data is 
generally not known. Furthermore, especially for small 
samples, it is difficult to discriminate between the various 
statistical distributions. To assess the sensitivity to the 
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Figure 3. Power of a likelihood ratio difference test with a = 0.05 
for the negative binomial distribution with dispersion parameter to, a 
mean /( for the comparator and a mean O/i for the GM plant for 
replication levels W = 6 (black), 10 (red), 20 (green), and 40 (blue). 
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Table 3. Number of replications needed to obtain a significant differ- 
ence test witin probability 80% when the quotient of the mean of the 
GMO and the comparator equals 0 = 2 for data which have a nega- 
tive binomial distribution with mean /i for the comparator, mean 6>/i 
for the GM plant, and dispersion parameter co. 
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Figure 4. Power of a likelihood ratio difference test with a. = 0.05 
for negative binomial data with dispersion parameter co, a mean /i for 
the comparator, and a mean O/i for the GM plant for replication level 
W = 40 when analyzed employing a negative binomial model (black), 
a quasi-Poisson model (red), and a log transformation (green). 

assumed distributional form, each simulated negative 
binomial dataset was also analyzed by two alternative 
models. The first alternative employs the quasi-Poisson 
approach in which the likelihood ratio test was scaled by 
the mean deviance under the fiiU model, whenever this 
mean deviance is larger than 1. The second alternative 
first log transforms the count data, after the addition of 
0.5, whenever there is at least one zero in the data, fol- 
lowed by an analysis of variance. Examples of power 
curves for replication N = 40 for the three models are 
given in Figure 4. This seems to suggest that the power of 
the negative binomial and the quasi-Poisson analysis are 
very similar and that the power of the log-transform 
approach is somewhat smaller especially for the combina- 
tion of a large mean iJ. and a large dispersion parameter m. 

Properties of equivalence test for the 
Poisson distribution 

Properties of the TOST approach to equivalence testing 
were assessed for count data which were simulated 



according to the Poisson distribution. The null hypothesis 
of non-equivalence is rejected in favor of equivalence 
when the confidence interval completely lies in the inter- 
val determined by fixed lower and upper equivalence lim- 
its. The same simulation setting as in the first simulation 
was used However, as the Poisson distribution was 
employed to simulate data there is no over dispersion. 
Hypothetical equivalence limits of 'A and 2 were 
employed to perform equivalence testing. A 95% likeli- 
hood ratio confidence interval for the ratio of the GMO 
mean and the comparator mean was calculated for each 
simulated dataset. The number of times this interval lies 
within the equivalence interval ('A, 2) can then be 
counted. As an example, the confidence interval for 40 
simulated datasets is given in Figure 5 with i^i = 5 for 
both the GMO and the comparator, so 6=1, and for 
various values of the number of replications N. In this 
case, the GMO and comparator have equal means and are 
thus theoretically equivalent. However, for small numbers 
of replications, the confidence intervals frequently cross 
the equivalence limits implying that the null hypothesis of 
non-equivalence is not always rejected. 

The number of replications required to detects a quo- 
tient of <5 between the GM plant and its comparator, as 
well as the number of replications required to reject the 
null hypothesis of non-equivalence with equivalence limits 
of 'A and 2 is given in Table 4. This clearly shows the 
asymmetry in the requirements for a difference test and 
an equivalence test. As the multiplicative difference 9 
increases, the number of replicates for a difference test 
decreases, while those for an equivalence test increases. 

The effect of excess-zeros 

To evaluate the effect of excess-zeros on the power of the 
ordinary likelihood ratio test, a separate simulation with 
the excess-zero negative binomial distribution was exe- 
cuted. Again, a single trial without blocking with a single 
measurement was assumed. Furthermore, a multiplicative 
difference of 0 = 2 was used between the GM plant and 
the comparator. The excess-zero probability was set to 
5 = 0, 0.1, 0.2 and 0.5. The mean {l—5)ji of the excess- 
zero distribution was set to 1, 5 and 40 ensuring that the 
means of the distributions are identical for different val- 
ues of 5. The data were analyzed with the negative bino- 
mial distribution as if there were no excess-zeros. The 
power for different levels of replication is given in 
Figure 6. This indicates that for small means and small 
excess-zero probabilities, the power is not much affected. 
However, for larger means, there can be a considerable 
decline of the power. For an excess probability of (5 = 0.5 
and larger means, the resulting distribution has a spike at 
zero in combination with larger values with not very 
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Figure 5. 95% likelihood ratio confidence 
intervals for the ratio of the Poisson means of 
the GM plant and the comparator when the 
underlying mean of both is ^ = 5 and various 
numbers of replication W. The red vertical lines 
denote the artificial equivalence limits set at 
1/2 and 2. 



Table 4. Number of replications needed to obtain a significant differ- 
ence test or to reject the hypothesis of non-equivalence with limits 'A 
and 2, with probability 80% when the quotient of the mean of the 
GMO and the comparator equals 0 for data which have a Poisson 
distribution with mean \i for the comparator and mean 0\i for the 
GM plant. 
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much in between. In such a situation, the estimate of the 
dispersion parameter becomes very large so as to "catch" 
both the zeros and the larger observations. Consequently, 
the distinction between the means of the comparator and 
the GMO disappears resulting in very low power values. 
In such a case, the data should be analyzed by means of 
an excess-zero distribution. 

Repeated measurements 

In field studies, observations on the same experimental 
units are frequently carried out on different points in 
time. The question then arises whether it is more fruitful 
to increase the number of observations in time or to 
increase the number of experimental units to achieve a 



certain power. Consider the situation where the mean 
count of a species follows a second-order polynomial on 
the log scale with a maximum /i at f = 0 and a value of 
0.6/( at f = ± 7 days. This defines a second-order polyno- 
mial on the log scale with parameters Popt = 0, 
Ptoi = 47.96 and /?n,ax = log(/i). According to the polyno- 
mial, the mean count at f = ± 14 days equals 0.13/(. Data 
were simulated with the Poisson-lognormal model with a 
dispersion parameter co = 0.25. This was carried out for 
the situation in which the comparator has a maximum 
mean count of /( and the GM plant a maximum of 2/(, so 
6 = 2. Three situations are simulated: a) a single observa- 
tion for each experimental unit at f = 0, b) repeated 
observations at f = —14, —7, 0, 7 and 14 assuming that 
these observations are independent and c) repeated obser- 
vations at the same five points in time but now assuming 
an autoregressive correlation across time for the five ran- 
dom effects on the log scale with correlation p = 0.8. 
Independent observations in time are included as a limit- 
ing case because in practice, it is unlikely that repeated 
observations on the same experimental unit are indepen- 
dent. In cases b) and c), observations on the five time 
points are summed for every experimental unit, and in all 
cases, the negative binomial distribution, which has the 
same variance function as the Poisson-lognormal, was 
used for analyzing the data. The power curve of the 
difference test for various replication levels is given in 
Figure 7. For /i = 1, there is a large benefit of repeated 
measurements for both the independent and the corre- 
lated counts in time. This benefit becomes smaller for lar- 
ger ji for the correlated counts, possibly to a point that it 
is more efficient to increase the replication level rather 
than sampling at different points in time. This will how- 
ever also depend on the size of the correlation. Only a 
minority of the papers listed in Table 1 explicitly consid- 
ered this issue in analyzing data from field experiments. 
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Figure 6. Power of a likelihood ratio difference test with y. = 0.05 for negative binomial data with overdispersion parameter co = 0.25 and 
additional excess-zeros with probability (5 = 0 (black), 0.1 (red), 0.2 (blue), and 0.5 (green). The comparator has mean /i(1 - S), and the GM plant 
has a mean of 2/i(1 - S). 




Figure 7. Power of a likelihood ratio difference test for Poisson-lognormal data with overdispersion parameter to = 0.25 and a single observation 
(black), the sum of 5 independent observations (red), and the sum of 5 dependent observations (blue). The mean of both the comparator and the 
GM plant follows a quadratic polynomial on the log scale with a maximum mean count of /i for the comparator and Iji for the GM plant (see 
text). 



Table 5. Number of replications needed to reject the hypothesis of 
non-equivalence with limits 1/3 and 3, with probability 80% using a 
two-sided test for the repeated measurements simulation, see text. 
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The confidence interval for the difi^erence parameter on 
the log scale was used to perform equivalence testing. The 
number of replications required to reject the null hypoth- 
esis of non-equivalence with equivalence limits of 1/3 and 
3 is given in Table 5. This also shows that, for the corre- 
lation case, the advantage of multiple sampling times 
becomes smaller as the mean increases. 

Discussion 

This paper describes a general fi-amework for simulating 
data typically encountered in environmental risk assess- 
ment of genetically modified plants. It focuses on field 



testing of GM plants to assess and compare their poten- 
tial adverse effects on non-target organisms. Based on 
such field trials, the assessment includes the use of statis- 
tical difference and equivalence testing. It is important to 
know the statistical properties of such tests, for example, 
the power and robustness of a test and whether the test 
has the correct significance level. The EFSA guideline 
(EFSA, 2010) for environmental risk assessment of GM 
plants for instance requires that each field trial should 
have sufficient replication to be able to yield a stand- 
alone analysis if required. Such and other statistical issues 
related to the assessment can best be researched by means 
of a statistical simulation model. 

Limits of concern are required for equivalence testing, 
but these may be difficult to specify for NTO experi- 
ments. They can of course be set to a fixed value by 
authorities, such as a 20% increase or decrease (Hothorn 
and Oberdoerfer, 2006), but this remains largely arbitrary. 
Alternatively, a set of reference varieties can be included 
in the same comparative experiment to allow comparison 
with the GM plant grown under the same conditions 
(van der Voet et al, 2011). The natural variation, in, for 
example, counts of NTOs, between the reference varieties 
can then be used to set limits of concern. 

The simulation model described in this paper can be 
used to generate data for various endpoints having 
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different statistical distributions. Typical environmental 
data of non-target organisms are counts or presence/ 
absence data. The basic distribution for counts is the 
Poisson distribution, while presence/absence data are 
commonly modeled by the binomial distribution. Taylor's 
power law (Taylor, 1961) suggests that overdispersion, 
relative to the Poisson distribution, is likely to be normal 
in environmental studies. Statistical distributions for this 
phenomena are the overdispersed Poisson, negative bino- 
mial and Poisson-lognormal distribution for counts; these 
have much wider tails and also a larger zero probability. 
Likewise, overdispersed distributions for binomial data 
are the beta-binomial and the binomial-logitnormal dis- 
tribution, which seem to be quite similar for a wide range 
of parameter values. In addition to overdispersion, 
excess-zeros might frequently be encountered especially 
for rather rare species (Cunningham & Lindenmayer, 
2005). Failure to account for this in the analysis may 
result in biased estimation of ecological effects (Sileshi, 
2008). The simulation model therefore also addresses 
zero-inflation for every distribution just described. 

Field trials for environmental risk assessment may be 
too small to discriminate between the various statistical 
distributions. For example, a large number of zeros can 
be due to structural zeros and thus zero-inflation or can 
be due to heavy clumping of individuals which gives rise 
to a negative binomial distribution with a large dispersion 
parameter. The simulation model can be used to assess 
the robustness of statistical models to digressions from 
the model. For instance, an analysis according to Taylor's 
power law, or possible an analysis of log counts using the 
normal distribution, might be a good comprise for the 
various overdispersed count distributions. This can also 
be researched by means of the simulation model. 

In many experiments, non-target organisms are sam- 
pled at different points in time. The simulation model 
accommodates this by the option to specify a constant, 
linear or quadratic pattern in time on the log- or logit- 
transformed scale. Moreover, repeated measures at the 
same plot might exhibit autocorrelation which can be 
modeled by a random effect with equal correlation across 
time or by an autoregressive process. This gives the 
opportunity to study various endpoints such as the 
summed counts over a specified period of time, the time 
of first occurrence of a species, the time at which maxi- 
mal abundance of a species occurs, or even the repeated 
measures themselves. 

Genotype by environment interaction is an important 
issue in the admission of GM plants. In the European 
context, the EFSA guideline (EFSA, 2010) also focuses on 
the receiving environment of the GM plant, assuming 
that different bio-geographical zones, in terms of meteo- 
rological, ecological and agricultural conditions, may 
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comprise different risks of growing GM plants. To 
develop appropriate statistical methods to handle geno- 
type by environment interaction in studies over multiple 
bio-geographic regions and under varying agronomical 
conditions, a simulation tool is indispensable. The inter- 
action is accommodated by the simulation model by 
additional random effects which operate on variety 
effects. Instead of a fixed variety effect, for e.g. the GM 
plant, in each separate trial (or environment), the variety 
effect, on the transformed scale, is drawn from a normal 
distribution with some mean and variance. This ensures 
that the difference between the GM plant and its compar- 
ator does have a common basis across trials, but might be 
varying from trial to trial. As an alternative, the simula- 
tion model also caters for a site by year interaction in 
which the "environments" are now structured by sites 
and years. 

All these elements are implemented into a single com- 
puter program which can handle non-normal distribu- 
tions for counts and absence/presence data possibly with 
excess-zeros, accommodates reference varieties in addition 
to the GM plant and its comparator, employs completely 
randomized or randomized block experiments, enables 
genotype by environment interaction by adding random 
variety effects, and finally repeated measures in time fol- 
lowing a constant, linear, or quadratic pattern in time 
possibly with some form of autocorrelation. The com- 
puter program is available as Supplementary Material to 
this paper. 

Although the tool is quite comprehensive, there are 
certain restrictions. It is not possible to define different 
dispersion parameters for different varieties or for differ- 
ent trials. This might be useful when one would like to 
keep the coefficient of variation constant across varieties 
rather than the dispersion parameter. Simulation by 
means of Taylor's power law is not implemented as there 
is no distribution associated with this law. In the current 
simulation model, it is not possible to have a linear effect 
in time for one variety and a quadratic time effect for 
another variety, but if needed, pseudo-linear behavior can 
be obtained by specific choices of the quadratic parame- 
ters. Also the time dependence is assumed to be equal 
across varieties. Split-plot experiments, with for example 
agronomical treatments on the main plot level and varie- 
ties on the subplot level, are not supported. 

The tool was used to perform a series of simple preli- 
minary simulations to demonstrate various aspects which 
can be studied by means of the tool. It is shown that the 
tool can be used for a prospective power analysis for both 
difference testing and equivalence testing. A simple sensi- 
tivity analysis seems to imply that, for simple experi- 
ments, the assumed distributional form may not be very 
critical. The effect of a small proportion of excess-zeros 
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might be small when the mean count itself is also small. 
Finally, it appears that the benefit of repeated measures, 
assuming autoregressive correlation in time, becomes 
smaller when the mean increases. 

Based on the simulation model as described in this 
paper, further work is needed to develop a protocol for 
prospective power analysis when designing field experi- 
ments for environmental non-target effects of GM plants. 
Requiring the use of such a protocol could avoid litera- 
ture reports where the conclusion of non-significant dif- 
ferences between GM and non-GM plants is not 
accompanied by a report on the statistical power of the 
field experiment. 
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